Self-Learning for Few-Shot Remote Sensing Image Captioning

Authors

Abstract

Large-scale caption-labeled remote sensing image samples are expensive to acquire, and the training samples available in practical application scenarios are generally limited. Caption generation tasks will therefore inevitably fall into the few-shot dilemma, resulting in poor-quality generated text descriptions. In this study, we propose a self-learning method named SFRC for few-shot remote sensing image captioning. Without relying on additional labeled samples or external knowledge, SFRC improves performance by ameliorating the way and the efficiency of learning on limited data. We first train an encoder for semantic feature extraction using a supplemental modified BYOL self-supervised method on a small number of unlabeled samples, where the unlabeled samples are derived from the labeled samples. The model self-ensemble then yields a parameter-averaging teacher based on the integration of intermediate model morphologies over a certain time horizon. Self-distillation uses the self-ensemble-obtained teacher to generate pseudo labels that guide the student in the next round of training to achieve better performance. Additionally, when optimizing parameters by back-propagation, we design a baseline incorporating self-critical training to reduce the variance during gradient computation and weaken the effect of overfitting. In a range of few-shot experiments, the evaluation metric scores of SFRC exceed those of recent methods. We also conduct percentage-sampling tests for captioning with even fewer samples, as well as ablation studies of the key designs of SFRC. The results prove that these designs for learning from sparse samples are indeed fruitful, and that each contributes to the method.
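The abstract names three reusable ingredients: a parameter-averaging teacher (self-ensemble), pseudo-label self-distillation, and a self-critical baseline for variance reduction. Below is a minimal PyTorch sketch of these three pieces under stated assumptions; the function names, momentum, and temperature are illustrative choices for the example, not the paper's settings.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def update_teacher(student, teacher, momentum=0.999):
    """Parameter-averaging self-ensemble: the teacher is an exponential
    moving average of the student's weights over the training trajectory."""
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)

def self_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """The self-ensembled teacher's outputs serve as soft pseudo labels
    that guide the student in the next round of training."""
    targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_probs, targets, reduction="batchmean") * temperature ** 2

def self_critical_loss(sample_logprob, sample_reward, greedy_reward):
    """Self-critical baseline: the (non-differentiable) reward of the greedy
    decode is subtracted from the sampled caption's reward, so only the
    advantage scales the policy gradient, reducing its variance."""
    advantage = sample_reward - greedy_reward
    return -(advantage * sample_logprob).mean()
```

In a training loop one would call `update_teacher` after each optimizer step and mix `self_distillation_loss` with the supervised captioning loss; the exact weighting is a design choice the abstract does not specify.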


Similar articles

Deep Self-taught Learning for Remote Sensing Image Classification

This paper addresses the land cover classification task for remote sensing images by deep self-taught learning. Our self-taught learning approach learns suitable feature representations of the input data using sparse representation and undercomplete dictionary learning. We propose a deep learning framework which extracts representations in multiple layers and uses the output of the deepest layer ...

Full text
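To make the stacked sparse-coding idea in the snippet above concrete, here is a minimal sketch using scikit-learn's DictionaryLearning; the patch dimensions, layer sizes, and sparsity penalty are assumptions for illustration, not the paper's configuration.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))  # stand-in for flattened image patches

features = X
for n_atoms in (32, 16):  # undercomplete: fewer atoms than input dimensions
    layer = DictionaryLearning(n_components=n_atoms,
                               transform_algorithm="lasso_lars",
                               transform_alpha=0.1,
                               random_state=0)
    # The sparse codes of each layer become the input of the next, deeper layer.
    features = layer.fit_transform(features)

print(features.shape)  # (200, 16): deepest-layer representation fed to a classifier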

Contrastive Learning for Image Captioning

Image captioning, a popular topic in computer vision, has achieved substantial progress in recent years. However, the distinctiveness of natural descriptions is often overlooked in previous work. It is closely related to the quality of captions, as distinctive captions are more likely to describe images with their unique aspects. In this work, we propose a new learning method, Contrastive Learn...

Full text
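The snippet above is cut off before the method details, but the distinctiveness idea can be expressed generically: a caption should be clearly more likely under its own image than under a mismatched (distractor) image. The margin form below is an assumption for illustration, not necessarily the paper's exact loss.

```python
import torch.nn.functional as F

def contrastive_caption_loss(log_p_matched, log_p_distractor, margin=1.0):
    """Encourage distinctive captions: penalize cases where a caption is
    nearly as likely under a mismatched image as under its own image."""
    return F.relu(margin - (log_p_matched - log_p_distractor)).mean()
```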

Few-shot Learning

Though deep neural networks have shown great success in the large data domain, they generally perform poorly on few-shot learning tasks, where a classifier has to quickly generalize after seeing very few examples from each class. The general belief is that gradient-based optimization in high capacity classifiers requires many iterative steps over many examples to perform well. Here, we propose ...

Full text
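The few-shot setting described above is conventionally evaluated with N-way K-shot episodes, in which the classifier sees only K labeled examples per class before being tested on a query set. A small episode sampler sketch (the episode sizes are illustrative):

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way=5, k_shot=1, n_query=15):
    """Sample an N-way K-shot episode from a list of class labels,
    returning (index, class) pairs for the support and query sets."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    classes = random.sample(sorted(by_class), n_way)
    support, query = [], []
    for c in classes:
        picked = random.sample(by_class[c], k_shot + n_query)
        support += [(i, c) for i in picked[:k_shot]]
        query += [(i, c) for i in picked[k_shot:]]
    return support, query
```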

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning

Existing image captioning approaches typically train a one-stage sentence decoder, which makes it difficult to generate rich, fine-grained descriptions. On the other hand, multi-stage image caption models are hard to train due to the vanishing gradient problem. In this paper, we propose a coarse-to-fine multi-stage prediction framework for image captioning, composed of multiple decoders, each of which...

Full text
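A minimal PyTorch sketch of the coarse-to-fine stacking idea follows; the GRU stages, dimensions, and the way stages are chained are illustrative assumptions, not the paper's exact architecture. Supervising every stage's logits is what counters the vanishing-gradient issue mentioned in the snippet.

```python
import torch
import torch.nn as nn

class CoarseToFineCaptioner(nn.Module):
    """Stack of decoders: each stage re-reads the image features together
    with the previous stage's hidden states and refines the word scores."""
    def __init__(self, vocab_size, d=256, stages=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        self.stages = nn.ModuleList(nn.GRU(2 * d, d, batch_first=True)
                                    for _ in range(stages))
        self.heads = nn.ModuleList(nn.Linear(d, vocab_size)
                                   for _ in range(stages))

    def forward(self, img_feat, tokens):
        # img_feat: (B, d) global image feature; tokens: (B, T) word ids
        x = self.embed(tokens)
        ctx = img_feat.unsqueeze(1).expand(-1, x.size(1), -1)
        outs, prev = [], torch.zeros_like(x)
        for gru, head in zip(self.stages, self.heads):
            h, _ = gru(torch.cat([x + prev, ctx], dim=-1))
            outs.append(head(h))  # per-stage logits, each supervised
            prev = h              # refined states feed the next stage
        return outs  # list of (B, T, vocab_size) logits, coarse to fine
```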

Prototypical Networks for Few-shot Learning

A recent approach to few-shot classification called matching networks has demonstrated the benefits of coupling metric learning with a training procedure that mimics the test conditions. This approach relies on an attention scheme that forms a distribution over all points in the support set, scaling poorly with its size. We propose a more streamlined approach, prototypical networks, that learns a metric space...

Full text
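The prototype idea is compact enough to sketch directly: each class is summarized by the mean of its embedded support points, and queries are classified by squared Euclidean distance to those prototypes. This sketch assumes precomputed embeddings and classes labeled 0..n_way-1.

```python
import torch

def proto_classify(support_emb, support_y, query_emb, n_way):
    """Prototypical-networks classification step: build one prototype per
    class as the mean of its support embeddings, then softmax over negative
    squared distances from each query to the prototypes."""
    protos = torch.stack([support_emb[support_y == c].mean(dim=0)
                          for c in range(n_way)])    # (n_way, d)
    dists = torch.cdist(query_emb, protos) ** 2      # (n_query, n_way)
    return (-dists).softmax(dim=-1)                  # class probabilities
```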


Journal

Journal title: Remote Sensing

Year: 2022

ISSN: 2315-4632, 2315-4675

DOI: https://doi.org/10.3390/rs14184606